Streaming Prefetch

Author

  • Olivier Temam
Abstract

In most commercial processors, data prefetching has been disregarded as a potentially effective solution to hide cache misses, multi-level caches being widely preferred. However, multi-level caches are mostly effective at removing capacity and conflict misses, while prefetching is particularly efficient at removing compulsory misses, especially in the regular accesses found in numerical codes. One of the main flaws of prefetching, which strongly limits its popularity in current processors, is that it can degrade global cache performance. Wrong address predictions are the first cause of cache pollution and of additional memory requests. All existing prefetch schemes are impaired by wrong predictions because they speculate on the next address to be referenced. In this paper, we show that all the information required to avoid the speculative aspect of prefetching can be easily obtained from the compiler, resulting in nearly no wrong predictions. Even when address prediction is flawless, prefetching can be hazardous to the cache, because cache checks (required before sending a prefetch request, to limit memory traffic) and cache reloads of incoming prefetch requests can result in cache stalls and thus processor stalls, particularly in superscalar processors where the cache can be accessed every cycle. We also show that addressing these implementation issues can make a prefetching scheme nearly transparent to normal cache operations. We have combined software-assisted address prediction with dedicated hardware support and obtained a prefetching scheme called streaming prefetch, where data can flow through the cache nearly without disruption.
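
The claim that prefetching removes compulsory misses on regular accesses can be illustrated with a toy simulation. The sketch below is not the paper's scheme: it assumes a small direct-mapped cache, an idealized prefetch that completes instantly, and a hypothetical `prefetch_stride` parameter standing in for stride information supplied by the compiler. The function name, cache sizes, and access pattern are all illustrative assumptions.

```python
def count_misses(addresses, line_size=32, num_lines=64, prefetch_stride=None):
    """Count demand misses in a toy direct-mapped cache.

    prefetch_stride is a hypothetical stand-in for compiler-supplied
    stride information. When it is set, every access also brings in
    the line holding (address + prefetch_stride). Prefetches are
    assumed to complete instantly, which real hardware does not do.
    """
    tags = [None] * num_lines  # one tag per cache line; None means empty
    misses = 0

    def present(addr):
        line = addr // line_size
        return tags[line % num_lines] == line

    def load(addr):
        line = addr // line_size
        tags[line % num_lines] = line

    for addr in addresses:
        if not present(addr):
            misses += 1          # demand miss: the line was never brought in
            load(addr)
        if prefetch_stride is not None:
            target = addr + prefetch_stride
            if not present(target):  # cache check before issuing the prefetch
                load(target)
    return misses


# Regular access pattern: 1024 consecutive 8-byte elements.
stream = [8 * i for i in range(1024)]
no_prefetch = count_misses(stream)
with_prefetch = count_misses(stream, prefetch_stride=32)
print(no_prefetch, with_prefetch)
```

Under these assumptions the sequential stream touches 256 distinct lines, so it incurs 256 compulsory misses without prefetching and a single miss (the very first access) with a correct one-line stride. The model ignores prefetch latency and the cache-port contention that the abstract identifies as the remaining implementation hazard.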

Similar resources

Investigating Shared Memory Tree Prefetching within Multimedia NoC Architectures

This paper provides further evaluation of the proposed hardware prefetch unit for the Blueshell NoC. This utilises a separate shared memory tree (Bluetree) for connecting CPUs to external memory. The tree is supplemented with a Prefetch Unit next to external memory. Prefetching is carried out in a streaming manner, with prefetch distance being varied between 1 and 4. Whilst previous work has su...

Understanding and Efficiently Servicing HTTP Streaming Video Workloads

Live and on-demand video streaming has emerged as the most popular application for the Internet. One reason for this success is the pragmatic decision to use HTTP to deliver video content. However, while all web servers are capable of servicing HTTP streaming video workloads, web servers were not originally designed or optimized for video workloads. Web server research has concentrated on reque...

Optimized adaptation of video streams in streaming servers

In this paper we investigate stream adaptation policies for video streaming servers which use smooth TCP-friendly congestion controls to compute the sending bitrate. We seek an optimal adaptation policy which maximizes the perceived quality at the client side and tries to avoid an empty prefetch buffer at the client side. In order to do so, we first develop an optimization framework for our pro...

A prefetching protocol for continuous media streaming in wireless environments

Streaming of continuous media over wireless links is a notoriously difficult problem. This is due to the stringent quality of service (QoS) requirements of continuous media and the unreliability of wireless links. We develop a streaming protocol for the real-time delivery of prerecorded continuous media from (to) a central base station to (from) multiple wireless clients within a wireless cell....

Layer thickness in congestion-controlled scalable video

We address the problem of the proper choice of the thickness of pre-encoded video layers in congestion-controlled streaming applications. While congestion control permits to distribute the network resources in a fair manner among the different video sessions, it generally imposes an adaptation of the streaming rate when the playback delay is constrained. This can be achieved by adding or droppi...

Publication date: 1996